Genome insights from the identification of a novel Pandoraea sputorum isolate and its characteristics

In this study, we sequenced a bacteria isolate Pandoraea sp. 892iso isolated from a Phytophthora rubi strain which is an important plant pathogenic oomycete, identified through genome and combined the data with existing genomic data from other 28 the genus of Pandoraea species. Next, we conducted a comparative genomic analysis of the genome structure, evolutionary relationships, and pathogenic characteristics of Pandoraea species. Our results identified Pandoraea sp. 892iso as Pandoraea sputorum at both the genome and gene levels. At the genome level, we carried out phylogenetic analysis of single-copy, gene co-linearity, ANI (average nucleotide identity) and AAI (average amino acid identity) indices, rpoB similarity, MLSA phylogenetic analysis, and genome-to-genome distance calculator calculations to identify the relationship between Pandoraea sp. 892iso and P. sputorum. At the gene level, the quorum sensing genes ppnI and ppnR and the OXA-159 gene were assessed. It is speculated that Pandoraea sp. 892iso is the endosymbiont of the Oomycetes strain of Phytophthora rubi.

Reflecting on previous research, Pandoraea spp. have frequently been misidentified in many clinical laboratories, leading to a lack of clinical documentation on their virulence potential. Therefore, it is important to accurately identify Pandoraea spp.. Earlier classification of prokaryotes was based solely on phenotypic similarities [22], but modern prokaryote characterization has been strongly influenced by advances in genetic methods. One criterion to be considered a species is to be essentially a collection of types that are characterized by at least one diagnostic phenotypic trait and to have purified DNA molecules that show at least 70% cross-hybridization (DNA-DNA hybridization, DDH) [22][23][24][25]. This is pragmatic and universally applicable within the bacterial domain, while the lack of this standard has been increasingly found when it comes to reliable diagnosis of infectious disease agents, international regulations for transport, quarantine, and so on [26][27][28]. Subsequently, this parameter has been applied most frequently in species identification at the whole genome level [29][30][31][32]. Genome Blast Distance Phylogeny (GBDP) [33], the core and pangenome [32], and the genomic-distance index based on DNA maximal unique matches (MUM) [34] are used to identify new species. Unfortunately, our understanding of Pandoraea spp. at the genomic level is relatively superficial, whereby the majority of the literature focuses principally on the usage of genotypic data to facilitate accurate genus-and species-level identification and secondarily on biotechnological potential [1,2,18,21].
In the present study, suspected bacteria isolated from an oomycete strain was identified through whole genome sequencing. The taxonomic status of this isolate was verified at the genome and gene levels, and its phylogenetic relationship with similar species was explored using indices, such as ANI/AAI, MLSA (Multi-locus Sequence Analysis) phylogenetic analysis, genome-to-genome distance calculations, quorum sensing, and oxacillinase gene analysis.

Strains, cultures, and DNA extraction
When we performed morphological observations on the hyphae of a Phytophthora rubi strain (No. 109892) from Westerdijk Fungal Biodiversity Institute, we inadvertently discovered the structure of suspected bacteria present in the mycelia. The structure still existed after the isolation by monofilament isolation and monospore isolation of the fungus. After isolation and culture, we obtained an analytical strain of bacteria, so that part of the name of which is called "892iso isolate". Separation, purification, and culture were carried out on beef extract peptone medium plates at 30˚C for 48 h. A TIANamp Bacteria DNA Kit (Tiangen, China) was used for genomic DNA.

Sequencing, assembly, and annotation
The whole genome was sequenced and assembled by a strategy that combined paired-end and mate-paired libraries. One targeted insert size of 500 bp was constructed using the TruSeq Nano DNA LT Library Prep Kit (Illumina, USA). One mate-paired library (2 kb) was constructed by the Nextera Mate Pair Sample Prep Kit (Illumina, FC-132-1001, USA) on the Illumina HiSeq 2500 platform. SOAPdenovo (v2.04) was used for de novo assembly. The assembled genome was annotated with a web-based tool called RAST (http://rast.nmpdr.org). RAST can identify repeat sequences in the genome, protein-encoding rRNA and tRNA genes, and assign functions to the genes.

Whole genome alignment and some indices calculation
Mauve (version 2.3.1) was used to align genomes for synteny analysis. The calculation of ANI and AAI was based on BLAST alignment results using a Perl script. The genome-togenome distance calculator calculations were based on a web server (https://ggdc.dsmz.de/) that uses multi-FASTA files as input. The ppnI/ppnR genes of P. pnomenusa were download from NCBI (accession ID KF887500.1 and KF900148.1), then aligned with all Pandoraea gene sets, all matches with the identity greater than 0.3 and score greater than 100 were retained. The ppnI candidates should contain PF00765 domain and ppnR candidates contain PF03472 domain, and the candidate pairs should be adjacent to each other. An intrinsic Carbapenem-Hydrolyzing Oxacillinases gene of Pandoraea sp. HD7676 was download from NCBI (accession ID: KP771987.1). BLAST was employed to identify homolog genes in the 28 Pandoraea species.

Comparative genome analysis
All protein sequences in reference genomes were downloaded and set as the query for all-vs-all BLASTP. OrthoMCL (version 2.0.8) was used to identify single-copy genes with I (inflation) set at 1.5. Next, MUSCLE (version 3.8.425) was used to align the sequences of the associated proteins. PAL2NAL (version 14.0) was used to convert the protein alignment to codon alignment. Gblock (version 0.91b) was used to remove the alignment results that were deemed unreliable. The phylogenetic tree was built by single-copy genes, with Burkholderia cepacia strain LO6 as the outgroup. MCMCTree software in PAML (version 4.7) was used to estimate the divergence time. CAFÉ (version 4) was used to calculate the expansion and contraction of these gene families.

Genome assembly, annotation, and validation of protein-coding genes
The genome of the Pandoraea sp. 892iso isolate was assembled from sequencing data generated by HiSeq 2000 by SOAPDenovo2 assembler. The total length of the top 48 longest scaffolds was 5.83 Mb, representing approximately 82.6-fold genome sequence coverage. The N50 and maximum lengths of scaffolds was 1.43 kb. Most of the length was concentrated on 12 scaffold sequences over 1,000 bp, of which the longest sequence was 2.06 MB (Fig 1). A total of 5,367 protein-coding genes were predicted from the genome assembly, 5,131 (95.60%) of which were supported by the RNA-seq data (coverage > = 90%). Within these protein-coding genes, 4,274 (79.63%) were assigned a biological function. Among the 1,093 ORFs without known function, 736 showed similarity to other database entries. For Pandoraea sp. 892iso isolate, the coding regions from the predicted genes constituted 88.61% of the genome (total length of all genes divide the genome size) and the average gene density was 919 genes per 1 Mb (total number of all genes divide the genome size, times with 100000bp), which were more or fewer than most of other sequenced Pandoraea species. The GC content of the genome, coding sequences, and repetitive elements were 62.66%, 63.32%, and 57.52%, respectively. A total of 63 tRNA genes were predicted from the assembly. The genome characteristics of Pandoraea sp. 892iso and other Pandoraea species are shown in Table 1.

Comparative genomics and identification at the genome level
Comparative genomic analysis. A total of genes in Pandoraea sp. 892iso were classified through cluster analysis. The distribution of best hits within the genus Pandoraea is shown in Fig 2. In total, 3,849 orthologous genes were shared in common between Pandoraea sp. 892iso and the other four Pandoraea species. The cluster analysis of Pandoraea sp. 892iso, 28 Pandoraea species, and Burkholderia cepacia as an outgroup was carried out by orthoMCL to obtain the result of a common single-copy gene family. The phylogenetic relationship in view of these single-copy genes is shown in Fig 3 and S1 Table, which shows the closest phylogenetic relationship to be between Pandoraea sp. 892iso and the P. sputorum strain DSM21091.

PLOS ONE
Meanwhile, eight specific gene families, including 21 genes, were clustered, 17 genes were hypothetical proteins, and the other four are shown in Table 2.
The global genome clustering and alignment of Pandoraea types were complicated by Mummer. The results showed the best gene co-linearity among these Pandoraea types and that rearrangement was almost absent, except for P. thiooxydans and P. sputorum, which were phylogenetically closest to Pandoraea sp. 892iso (S1 Fig). It was speculated that the small external selection pressure of the Pandoraea group and the genome evolution occurred in a similar way. More attention should be given to Pandoraea sp. 892iso and its proximal P. sputorum, both of which rearranged compared to other Pandoraea types. Rearrangements existed in the database, which may be related to the function of covalent attachment to other cellular proteins associated with stability changing, localization, and activity of the target protein [35]. The ubiquitin gene in Pandoraea sp. 892iso was found to be different from that in human, mouse, zebrafish, rice, Arabidopsis, yeast, or other model organisms by phylogenetic analysis (Fig 4).
ANI and AAI. ANI (average nucleotide identity), as the new method for bacterial species definition, provides several benefits, avoids misplacement based on phenotypic similarities or chemical characteristics, provides a scalable and uniform approach that works for both culturable and nonculturable species, is faster and cheaper than traditional taxonomic methods, and, most importantly, falls in line with Darwin's vision of classification [30]. AAI (average amino acid identity), a method that compares all conserved protein-coding genes present in a given set of genomes, clusters types into groups that share more than 95% AAI [36]. ANI and AAI characteristics have been used to evaluate the accuracy of these genotypic methods in the identification of Pandoraea species. Given the availability of whole genome sequence data and Pandoraea sp. 892iso nucleotide and amino acid data as query, Blastn by CDS sequence coverage was � 50% and tblastn by protein coverage was � 70%. We performed sequence-based genotypic microbial identification analysis using the RefSeq database by genome comparison between Pandoraea sp. 892iso and Pandoraea sputorum and generated an ANI value of 98.81% and an AAI value of 91.18%; genome comparison with other in-house sequenced Pandoraea species provided an ANI value of less than 93.34% and an AAI value of 84.90% (Table 3). Based on previous results using the ANI value for species definition, ANI and AAI values of � 95% corresponded to the traditional 70% DNA-DNA. Using the ANI and AAI values of Pandoraea sp. 892iso, it can be unequivocally stated that Pandoraea sp. 892iso is phylogenetically close to P. sputorum. rpoB similarity and MLSA phylogenetic analysis. The rpoB gene, encoding the β-subunit of RNA polymerase, has emerged as a core gene candidate for phylogenetic analyses and identification of bacteria; it is a single-copy gene, belongs to the common set of genes, and is long enough to contain phylogenetically useful information for some bacterial declination [37][38][39][40]. Multilocus sequence analysis (MLSA) is a currently widely used method for prokaryotic taxonomy, which utilizes internal fragments of several protein-coding genes. It was introduced by Gevers et al. and is increasingly being applied to obtain higher resolution power among species within a genus [39,41]. As a typing technique for type characterization that shows variation in multiple housekeeping genes, a concatenation of five housekeeping genes, shikimate dehydrogenase (aroE), guanylate kinase (gmk), phosphate acetyltransferase (pta), triosephosphate isomerase (tpi), and acetyl coenzyme A acetyltransferase (yqiL), was recommended for our bacterial delineation, as well as for clarifying the taxonomic situation within the Pandoraea family [39,41]. The phylogenetic tree topologies of Pandoraea sp. 892iso and other Pandoraea spp. by rpoB similarity (Fig 5A) and MLSA analysis (Fig 5B) revealed Pandoraea sp. 892iso to have the closest phylogenetic relationship with Pandoraea sputorum strain DSM21091. Genome-to-genome distance calculator. In silico genome-to-genome comparison to obtain an estimate of the overall similarity between the genomes of two types has enabled the taxonomist to perform genome-based species delineation and genome-based subspecies delineation. These distance functions can also cope with heavily reduced genomes and repetitive sequence regions. The Genome-to-Genome Distance Calculator (GGDC) calculates the distances by comparing genomes to obtain HSPs (high-scoring segment pairs) and interfering distances from a set of formulas: 1) HSP length/total length; 2) identities/HSP length; and 3) identities/total length [42]. An estimated GGDC of the overall similarity between Pandoraea sp. 892iso and other Pandoraea species is shown in Table 4. In probability DDG �70% index  analysis, the pairwise comparison of the genome with P. sputorum was found to be 98.49%, 96.97%, and 99.88% for the HSP length/total length, identities/HSP length, and identities/total length ratios, respectively. Thus, the close relationship of Pandoraea sp. 892iso and P. sputorum was verified.

Some special genes among Pandoraea sp
Quorum sensing (QS). The most studied QS molecule is N-acyl homoserine lactone (AHL), which is secreted by gram-negative proteobacteria. AHLs are secreted by LuxI homologs until a threshold concentration of AHL is attained before they bind to LuxR homologs and subsequently activate a cascade of QS-regulated gene expression [43]. The predicted putative AHL synthase (ppnI) and AHL receptor protein (ppnR) in Pandoraea sp. 892iso and the nine Pandoraea species are shown in Table 5. The phylogenetic trees of putative AHL synthase (ppnI) and AHL receptor protein (ppnR) are shown in Fig 6. Intrinsic carbapenem-hydrolyzing oxacillinases. Oxacillinases are serine β-lactamases of molecular class D. Many bacterial species could produce OXA-type enzymes, some of them with carbapenem-hydrolyzing activity. The nine Pandoraea-derived oxacillinase genes, named OXA-159, encode 292 amino acids and were found to be new oxacillinase variants [44]. The predicted genes with the function of OXA-159 in Pandoraea sp. 892iso and the nine Pandoraea species are shown in Table 6. The phylogenetic trees of genes with the putative function of OXA-159 are shown in Fig 7.

Conclusions
We sequenced Pandoraea sp. 892iso from the genome of a Phytophthora rubi strain (numbered 109892) and combined the data with existing genomic data for other Pandoraea species. Next, we conducted a comparative genomic analysis of the genome structure, evolutionary relationships, and pathogenic characteristics of Pandoraea species. Our results identified Pandoraea sp. 892iso as Pandoraea sputorum at both the genome and gene levels. At the genome level, we carried out phylogenetic analysis of single-copy, gene co-linearity, ANI and AAI indices, rpoB similarity, MLSA phylogenetic analysis, and genome-to-genome distance calculator calculations to identify the relationship between Pandoraea sp. 892iso and P. sputorum. At the gene level, the quorum sensing genes ppnI and ppnR and the OXA-159 gene were analyzed. It is speculated that Pandoraea sp. 892iso is the endosymbiont of the Phytophthora rubi strain. Project administration: Rui-Fang Gao.